Summarizing Topics: From Word Lists to Phrases

Authors

  • Lauren A. Hannah
  • Hanna M. Wallach
Abstract

In this paper, we present a two-stage approach to generating descriptive phrases from the output of a statistical topic model, such as LDA [4]. First, we propose a Bayesian method for selecting statistically significant phrases from a corpus of documents, using inferred parameter values from LDA. Second, the selected phrases are combined with the topic assignments to form a list of candidate phrases for each topic. These phrases are then ranked by descriptiveness, using a metric based on the weighted KL divergence between the topic probabilities implied by the phrase and those implied by the inferred parameter values from LDA.
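The ranking stage described above can be illustrated with a rough sketch. The abstract does not give the exact form of the weighting, so the code below makes its own assumptions: each phrase is weighted by its relative frequency among candidate phrases, and the score for a topic is that topic's pointwise contribution to the KL divergence between the topic distribution implied by the phrase and a baseline topic distribution. The function name `rank_phrases_by_topic` and its inputs are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def rank_phrases_by_topic(phrase_topic_counts, topic_prior):
    """Rank candidate phrases per topic (illustrative sketch, not the paper's metric).

    phrase_topic_counts: {phrase: {topic: count}}, how often each phrase
        occurrence was assigned to each topic (hypothetical input format).
    topic_prior: {topic: probability}, baseline topic probabilities, e.g.
        derived from inferred LDA parameters; every value must be > 0.
    """
    total = sum(sum(c.values()) for c in phrase_topic_counts.values())
    scored = defaultdict(list)
    for phrase, counts in phrase_topic_counts.items():
        n = sum(counts.values())
        weight = n / total  # ASSUMPTION: weight = relative phrase frequency
        for topic, prior in topic_prior.items():
            p = counts.get(topic, 0) / n  # p(topic | phrase)
            if p > 0:
                # Pointwise KL term for this topic, scaled by the weight.
                scored[topic].append((weight * p * math.log(p / prior), phrase))
    # Highest score first within each topic.
    return {
        topic: [phrase for _, phrase in sorted(pairs, reverse=True)]
        for topic, pairs in scored.items()
    }


if __name__ == "__main__":
    counts = {
        "neural network": {0: 9, 1: 1},
        "of the": {0: 5, 1: 5},
        "stock market": {1: 8},
    }
    prior = {0: 0.5, 1: 0.5}
    for topic, phrases in rank_phrases_by_topic(counts, prior).items():
        print(topic, phrases)
```

In this toy input, a phrase such as "of the" that is spread evenly across topics scores zero, while phrases concentrated in one topic rise to the top of that topic's list. That is the intuition behind using a divergence from the baseline topic probabilities as a measure of descriptiveness.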


Similar resources

Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels

Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against au...


Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels

Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against au...


Visualizing Topics with Multi-Word Expressions

We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant n-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, th...


Applying Word Sketches to Russian

The paper describes work on writing a Russian Sketch grammar for the system Sketch Engine. The objective of such a system is to provide lexicographers with sufficient lexical material and tools for getting information about a word’s collocability and to generate lists of the most frequent phrases for a given word, and then to classify them for appropriate syntactic models. The system will give ...


BlogPulse: Automated Trend Discovery for Weblogs

Over the past few years, weblogs have emerged as a new communication and publication medium on the Internet. In this paper, we describe the application of data mining, information extraction and NLP algorithms for discovering trends across our subset of approximately 100,000 weblogs. We publish daily lists of key persons, key phrases, and key paragraphs to a public web site, BlogPulse.com. In a...



Publication date: 2014